Memory-based FFT/IFFT processor and design method for general sized memory-based FFT processor

ABSTRACT

For a large-size FFT computation, this invention decomposes the FFT into several smaller FFTs by a decomposition equation and transforms the original index from one dimension into a multi-dimensional index vector. By controlling the index vector, this invention distributes the input data into different memory banks such that both the in-place policy for computation and the multi-bank memory for a high-radix structure are supported simultaneously without memory conflict. In addition, to keep memory access conflict-free when the in-place policy is also adopted for I/O data, this invention reverses the decomposition order of the FFT to satisfy the vector-reverse behavior. This invention can minimize the area and effectively reduce the necessary clock rate of a general sized memory-based FFT processor design.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to a memory-based FFT/IFFT processor and to a design method for a general sized memory-based FFT processor that minimizes the area and reduces the necessary clock rate.

2. Description of the Related Art

A. Prior Arts List

1. USA Patent:

        Pat. No.                         Title
[A1]    4,477,878                        Discrete Fourier transform with non-tumbled output
[A2]    5,091,875                        Fast Fourier transform (FFT) addressing apparatus and method
[A3]    7,062,523                        Method for efficiently computing a fast Fourier transform
[A4]    7,164,723                        Modulation apparatus using mixed-radix fast Fourier transform
[A5]    20060253514 (Publication No.)    Memory-based Fast Fourier Transform device
[A6]    20080025199 (Publication No.)    Method and device for high throughput n-point forward and inverse fast Fourier transform

2. China Patent:

        Pat. No.                            Title
[A7]    01140060.9                          The architecture for 3780-point DFT processor
[A8]    03107204.6                          The multicarrier systems and method with 3780-point IDFT/DFT processor
[A9]    200410090873.2 (Publication No.)    The oversampling method for 3780-point DFT
[A10]   200610104144.7 (Publication No.)    The 3780-point DFT processor
[A11]   200710044716.1 (Publication No.)    The water-flowed 3780-point FFT processor

3. Articles

[B1] Z.-X. Yang, Y.-P. Hu, C.-Y. Pan, and L. Yang, “Design of a 3780-point IFFT processor for TDS-OFDM,” IEEE Trans. Broadcast., vol. 48, no. 1, pp. 57-61, March 2002.
[B2] L. G. Johnson, “Conflict free memory addressing for dedicated FFT hardware,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 5, pp. 312-316, May 1992.
[B3] B. G. Jo and M. H. Sunwoo, “New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 5, pp. 911-919, May 2005.

B. Description of Prior Arts

(1) [A1] cannot support a multi-bank memory structure. Hence, for a radix-r computation, it takes r clock cycles to read the data from memory and to write the computed data back to memory. As a result, the FFT needs more computation cycles and thus demands a higher processor clock rate for real-time applications. This invention solves this problem by supporting multi-bank addressing without memory conflict, such that the r data of a radix-r computation can be accessed in one clock cycle.

(2) [A2], [A5] and [B2] support only a fixed radix r. Hence, they can only be applied to FFTs of size N=r^(n). They do not work for applications such as the 3780-point FFT for Chinese DTV or the 3072-point FFT for PLC. This invention supports any general mixed radix, such that it works for FFT applications of any size.

(3) [A3] supports only a fixed radix r; hence, it cannot support applications such as Chinese DTV or PLC. Besides, since it cannot support a multi-bank memory structure, it needs r clock cycles to access the data in memory for a radix-r computation, and therefore needs a higher clock rate for the FFT computation than a processor with a multi-bank memory structure. This invention not only supports variable radix for FFT applications of any size but also supports a multi-bank memory structure to reduce the necessary clock rate without memory conflict.

(4) [A4] and [B3] support only the radix-2/4 algorithm. Therefore, they only work for FFTs of size N=2^(n), and do not work for FFT applications of other sizes such as Chinese DTV with N=3780. This invention works for all of them, since it supports any mixed radix. Besides, for a long FFT processor design such as N=8192, this invention makes the processor design more flexible, since the maximum radix that [A4] can support is only radix-4 whereas this invention can support radices greater than radix-4.

(5) [A6] describes some candidate decompositions of 3780, for instance, 3780=3×3×3×2×2×5×7. It implements each small-size FFT module with the MDC structure to eliminate some of the large internal buffers in [A7]-[A11]. However, this costs more hardware, since each module must finish all of its computation within one clock cycle. Besides, real system applications require in-order output data, but its output data are not in order.

(6) [A7], [A8], [A9], [A10] and [A11] implement the 3780-point FFT processor with an architecture similar to a pipeline. Their architectures need a large internal buffer to reorder the data for processing. Besides, real system applications require both in-order I/O data and support for continuous data flow. In order to achieve this, [A7] and [A8] need at least 3N words; [A9] and [A11] need at least 5N words; [A10] needs 6N words of memory. This invention achieves this requirement with only 2N memory words. Note that the output data of [A7], [A8] and [A11] are not in order, and they need “at least” one N-word memory to reorder the output data.

(7) The output data of the 3780-point FFT processor proposed in [B1] are not in order. To output them in order, it needs a buffer to reorder the output data. Hence, the design needs a memory size of more than 4N words to achieve continuous data flow and in-order I/O data. This invention needs only 2N words of memory.

Classification of the Prior Arts

                 [A1]           [A2] [B2] [A5]    [A3]             [A4] [B3]
Radix *1         All general    Fixed radix-r     Fixed radix-r    Radix-2/4
In-place *2      Yes            Yes               Yes              Yes
Memory *3        2N words       2N words          2N words         2N words
Multi-bank *4    No             Yes               No               Yes

*1 The row “Radix” gives the butterfly sizes that the prior art can support.
*2 The row “In-place” indicates whether the prior art supports the in-place policy to reduce the necessary memory size.
*3 The row “Memory” gives the memory size that the prior art needs for real-time application.
*4 The row “Multi-bank” indicates whether the prior art can support the multi-bank memory structure to reduce the necessary clock rate.

SUMMARY OF THE INVENTION

To overcome the drawbacks of the prior arts, this invention proposes a design method for a general sized memory-based FFT processor, comprising the steps of: (1) decomposing a large-size FFT computation into several relatively smaller FFTs by a decomposition equation and transforming the original index from one dimension into a multi-dimensional index vector; (2) controlling the index vector to distribute the input data into different memory banks such that both the in-place policy for computation and the multi-bank memory for a high-radix structure are supported simultaneously without memory conflict; (3) reversing the decomposition order of the FFT to satisfy the vector-reverse behavior, in order to keep memory access conflict-free when the in-place policy is also adopted for I/O data. Through these steps, the area is minimized and the necessary clock rate is reduced.

This invention also proposes a memory-based FFT/IFFT processor comprising: a main memory to store data; a processing element for the decomposed smaller sized FFT computations; and a control unit to control (1) the memory for I/O data and butterfly computation, (2) the computation order of the smaller sized FFTs, and (3) the memory addressing for data access with the in-place policy.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1. The design block of FFT processor of this invention.

FIG. 2. The index vector generator hardware implementation of this invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

For the best understanding of this invention, please refer to the following detailed description of the preferred embodiments and the accompanying drawings.

The definition of DFT is

${X(k)} = {\sum\limits_{n = 0}^{N - 1}{{x(n)}{W_{N}^{nk}.}}}$

The method of decomposing a larger DFT into several smaller DFTs has been proposed in article C1 [C. Burrus, “Index mappings for multidimensional formulation of the DFT and convolution,” IEEE Trans. Acoust., Speech, Signal Process., vol. 25, no. 3, pp. 239-242, June 1977]. Here, the present invention takes the decomposition equation (1) for the size N=N₁N₂ for illustration:

$\begin{matrix}\left\{ \begin{matrix}{n = {{N_{2}n_{1}} + {A_{2}n_{2\mspace{14mu}}{mod}\mspace{14mu} N}}} & {n_{1},{k_{1} = 0},1,\ldots \mspace{11mu},{N_{1} - 1}} \\{k = {{B_{1}k_{1}} + {N_{1}k_{2}\mspace{14mu} {mod}\mspace{14mu} N}}} & {n_{2},{k_{2} = 0},1,\ldots \mspace{11mu},{N_{2} - 1}}\end{matrix} \right. & (1)\end{matrix}$

Eq. (1) maps the indexes n and k to the index vectors (n₁, n₂) and (k₁, k₂), from one dimension [0, N−1] into two dimensions [0, N₁−1]×[0, N₂−1]. The selection of the coefficients A₂ and B₁ depends on the relation between N₁ and N₂. In this invention, the coefficients for Eq. (1) are chosen as follows:

Case I: If N₁ and N₂ are relatively prime, we choose A₂ and B₁ such that

A₂ = p₁N₁ and A₂ = q₁N₂ + 1

B₁ = p₂N₂ and B₁ = q₂N₁ + 1

Here, p₁, q₁, p₂ and q₂ are all positive integers. Then, the definition of the DFT can be rewritten as Eq. (2):

$\begin{matrix}\begin{matrix}{{X\left( {k_{1},k_{2}} \right)} = {\sum\limits_{n_{2}}^{\;}{\sum\limits_{n_{1}}^{\;}{{x\left( {n_{1},n_{2}} \right)}W_{N_{1\;}}^{n_{1}k_{1}}W_{N_{2}}^{n_{2}k_{2}}}}}} \\{= {\sum\limits_{n_{2}}^{\;}{\left\{ {\sum\limits_{n_{1}}^{\;}{{x\left( {n_{1},n_{2}} \right)}W_{N_{1\;}}^{n_{1}k_{1}}}} \right\} W_{N_{2}}^{n_{2}k_{2}}}}} \\{= {\sum\limits_{n_{2}}^{\;}{{y\left( {k_{1},n_{2}} \right)}W_{N_{2}}^{n_{2}k_{2}}}}}\end{matrix} & (2)\end{matrix}$

Case II: If N₁ and N₂ are not relatively prime, we choose A₂=B₁=1. Then, the definition of the DFT can be rewritten as Eq. (3):

$\begin{matrix}\begin{matrix}{{X\left( {k_{1},k_{2}} \right)} = {\sum\limits_{n_{2}}^{\;}{\sum\limits_{n_{1}}^{\;}{{x\left( {n_{1},n_{2}} \right)}W_{N_{1\;}}^{n_{1}k_{1}}W_{N_{2}}^{n_{2}k_{2}}W_{N}^{n_{2}k_{1}}}}}} \\{= {\sum\limits_{n_{2}}^{\;}{\left\{ {W_{N}^{n_{2}k_{1}}{\sum\limits_{n_{1}}^{\;}{{x\left( {n_{1},n_{2}} \right)}W_{N_{1\;}}^{n_{1}k_{1}}}}} \right\} W_{N_{2}}^{n_{2}k_{2}}}}} \\{= {\sum\limits_{n_{2}}^{\;}{W_{N}^{n_{2}k_{1}}{y\left( {k_{1},n_{2}} \right)}W_{N_{2}}^{n_{2}k_{2}}}}}\end{matrix} & (3)\end{matrix}$

Eqs. (2) and (3) imply that a larger N-point FFT can be computed by smaller N₁-point and N₂-point FFTs in the first and second stages respectively. For a fixed n₂, the input data of the N₁-point FFT in the first stage are x(n₁, n₂), n₁=0, 1, . . . , N₁−1, and the output data of the first stage corresponding to this n₂ are y(k₁, n₂), k₁=0, 1, . . . , N₁−1. For a fixed k₁, the input data of the N₂-point FFT in the second stage are y(k₁, n₂), n₂=0, 1, . . . , N₂−1, and the output data of the second stage corresponding to this k₁ are X(k₁, k₂), k₂=0, 1, . . . , N₂−1. The difference between Eqs. (2) and (3) is that, if N₁ and N₂ are not relatively prime, there is a twiddle factor $W_{N}^{n_{2}k_{1}}$ between the first and second stages, as shown in Eq. (3).
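As a numerical illustration of Eqs. (1) to (3), the following minimal sketch compares a direct DFT with the two-stage computation for one Case I size (N=15=3×5, with A₂=6 and B₁=10) and one Case II size (N=8=2×4, with A₂=B₁=1). The function names, the test sizes and the test data are assumptions made for this example only and are not part of the described processor.

```python
import cmath

def dft(x):
    """Direct N-point DFT, X(k) = sum over n of x(n) * W_N^(n*k)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def two_stage_dft(x, N1, N2, A2, B1, twiddle):
    """N1*N2-point DFT computed in two stages with the index maps of Eq. (1)."""
    N = N1 * N2
    w = lambda M, e: cmath.exp(-2j * cmath.pi * e / M)   # W_M^e
    n_of = lambda n1, n2: (N2 * n1 + A2 * n2) % N          # input map
    k_of = lambda k1, k2: (B1 * k1 + N1 * k2) % N          # output map
    # First stage: one N1-point DFT for every fixed n2, giving y(k1, n2).
    y = [[sum(x[n_of(n1, n2)] * w(N1, n1 * k1) for n1 in range(N1))
          for n2 in range(N2)] for k1 in range(N1)]
    X = [0j] * N
    for k1 in range(N1):
        for k2 in range(N2):
            # Second stage: one N2-point DFT for every fixed k1.
            # The twiddle factor W_N^(n2*k1) is only present in Case II, Eq. (3).
            X[k_of(k1, k2)] = sum(
                (w(N, n2 * k1) if twiddle else 1) * y[k1][n2] * w(N2, n2 * k2)
                for n2 in range(N2))
    return X

def max_error(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

# Arbitrary test data (an assumption of this sketch).
x15 = [complex(n, (n * n) % 7) for n in range(15)]
x8 = [complex(n % 3, n) for n in range(8)]

# Case I: 3 and 5 are relatively prime, A2 = 2*3 = 1*5 + 1, B1 = 2*5 = 3*3 + 1.
err_case1 = max_error(dft(x15), two_stage_dft(x15, 3, 5, A2=6, B1=10, twiddle=False))
# Case II: 2 and 4 are not relatively prime, A2 = B1 = 1.
err_case2 = max_error(dft(x8), two_stage_dft(x8, 2, 4, A2=1, B1=1, twiddle=True))
```

Both errors should be at the level of floating-point rounding, which reflects that the two-stage form reproduces the direct DFT in both cases.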

Now, the present invention considers computing the N₁-point DFT in the first stage and the N₂-point DFT in the second stage for the first DFT symbol. The present invention distributes the input data into different memory banks. Suppose N₂≥N₁ and the number of memory banks is N₂; then the input data can be distributed into the N₂ memory banks by Eq. (4) to achieve conflict-free memory access.

bank = n₁ + n₂ mod N₂   (4)

The key to conflict-free memory access is to distribute the data into the memory banks by Eqs. (1) and (4). Once the memory bank is selected, the data need to be addressed. Any two data in the same memory bank should be mapped to different addresses within the range from 0 to N₁−1. For simplicity, we choose Eq. (5) for data addressing.

address=n₁   (5)
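The conflict-free property of Eqs. (4) and (5) can be checked exhaustively for small sizes. The sketch below is only an illustration under the assumption that the in-place policy stores y(k₁, n₂) in the location previously held by x(n₁, n₂) with n₁=k₁; the function names are hypothetical.

```python
def bank(n1, n2, N2):
    return (n1 + n2) % N2      # Eq. (4)

def address(n1, n2):
    return n1                  # Eq. (5)

def is_conflict_free(N1, N2):
    """Check Eqs. (4) and (5) for an N1*N2-point FFT stored in N2 banks."""
    # Every datum needs its own (bank, address) location.
    slots = {(bank(n1, n2, N2), address(n1, n2))
             for n1 in range(N1) for n2 in range(N2)}
    if len(slots) != N1 * N2:
        return False
    # First stage: the N1 inputs of one butterfly share n2
    # and must lie in N1 different banks.
    for n2 in range(N2):
        if len({bank(n1, n2, N2) for n1 in range(N1)}) != N1:
            return False
    # Second stage: y(k1, n2) is assumed to sit where x(n1 = k1, n2) was,
    # so the N2 inputs of one butterfly share the first index.
    for k1 in range(N1):
        if len({bank(k1, n2, N2) for n2 in range(N2)}) != N2:
            return False
    return True
```

For example, `is_conflict_free(3, 5)` holds, while `is_conflict_free(5, 3)` fails in the first stage because five butterfly inputs cannot lie in three different banks, which illustrates why N₂≥N₁ is supposed.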

After the first DFT symbol has been computed, the data should be output in order. The output index is mapped by Eq. (4). While the computed DFT symbol data are output in order, the input data of the second DFT symbol are simultaneously input in order with the in-place policy. That is, the new input data x(i) is put in the location of the output data y(i). The present invention computes the second DFT symbol in the reverse order of the first DFT symbol: it computes the N₂-point DFT in the first stage and the N₁-point DFT in the second stage. After the second DFT symbol has been computed, the I/O data again adopts the in-place policy. The third DFT symbol then returns to the state of the first DFT symbol.

In the following, the present invention takes the 3780-point FFT, which has been adopted for the Chinese DTV application, as an example for detailed illustration. The decomposition order used in the following is only one of the candidate selections.

Since 3780=4×3×3×3×5×7, we can complete it by computing 3-point, 4-point, 5-point and 7-point FFTs. Here, the data are distributed into 7 memory banks.

First, the present invention takes the decomposition order 4, 3, 3, 3, 5 and then 7. For the first stage (4-point FFT), the decomposition equation (6) is taken.

$\begin{matrix}\left\{ \begin{matrix}{n = {{945\; n_{1}} + {2836\; {\overset{\sim}{n}}_{2}\mspace{14mu} {mod}\; 3780}}} & {n_{1},{k_{1} = 0},1,2,3} \\{k = {{945\; k_{1}} + {4\; {\overset{\sim}{k}}_{2}\mspace{14mu} {mod}\; 3780}}} & {{\overset{\sim}{n}}_{2},{{\overset{\sim}{k}}_{2} = 0},\ldots \mspace{11mu},944}\end{matrix} \right. & (6)\end{matrix}$

It maps the index n to the vector (n₁, ñ₂) from [0, 3779] to [0, 3]×[0, 944], as shown in Table I.

TABLE I
THE INDEX MAPPING FOR THE FIRST STAGE

             n₁ = 0     n₁ = 1     n₁ = 2     n₁ = 3
ñ₂ = 0       x[0]       x[945]     x[1890]    x[2835]
ñ₂ = 1       x[2836]    x[1]       x[946]     x[1891]
ñ₂ = 2       x[1892]    x[2837]    x[2]       x[947]
. . .        . . .      . . .      . . .      . . .
ñ₂ = 944     x[944]     x[1889]    x[2834]    x[3779]

The data in each row of Table I are the input data of one 4-point FFT. The input order is decided by the index n₁. For example, for the row ñ₂=1, the input order is (n₁, ñ₂)=(0,1), (1,1), (2,1) and (3,1). They correspond to the data x[2836], x[1], x[946] and x[1891] respectively, as shown in Table I.
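The coefficients of Eq. (6) and the rows of Table I can be reproduced with a short calculation. The following sketch searches for the smallest coefficients that satisfy the Case I conditions for N₁=4 and N₂=945; the helper names are assumptions introduced for this example.

```python
def coprime_coefficient(multiple_of, one_mod):
    """Smallest positive multiple of `multiple_of` that equals 1 modulo `one_mod`."""
    for p in range(1, one_mod + 1):
        if (p * multiple_of) % one_mod == 1:
            return p * multiple_of
    raise ValueError("the two sizes are not relatively prime")

N1, N2 = 4, 945
N = N1 * N2
A2 = coprime_coefficient(N1, N2)   # A2 = p1*N1 = q1*N2 + 1
B1 = coprime_coefficient(N2, N1)   # B1 = p2*N2 = q2*N1 + 1

def input_index(n1, n2_tilde):
    """First line of Eq. (6): the sample index n for the vector (n1, n2~)."""
    return (N2 * n1 + A2 * n2_tilde) % N

def table_row(n2_tilde):
    """Sample indexes fed to the 4-point FFT of one row of Table I."""
    return [input_index(n1, n2_tilde) for n1 in range(N1)]
```

With these definitions, `A2` evaluates to 2836 and `B1` to 945, matching Eq. (6), and `table_row(1)` gives the indexes 2836, 1, 946 and 1891 of the row ñ₂=1.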

Since the input data of each 4-point FFT have the same index ñ₂, memory conflict can be avoided in the first stage by distributing the data into different memory banks by Eq. (7).

bank = n₁ + ñ₂ mod 7   (7)

After the first stage, the original data has been divided into 4 independent groups for four 945-point FFTs, corresponding to k₁=0, 1, 2 and 3 respectively.

Similarly, the 945-point FFT can be decomposed in the order 945=3×3×3×5×7, mapping the index ñ₂ to the vector (n₂, n₃, n₄, n₅, n₆) from [0, 944] to [0, 2]×[0, 2]×[0, 2]×[0, 4]×[0, 6]. By combining the decomposition equations of all stages, the full index mapping equations for the 3780-point FFT in this decomposition order are obtained as Eqs. (8) and (9). The bank selection Eq. (10) is taken for conflict-free memory access, and the addressing equation can be taken as Eq. (11).

n = 945n₁ + 1260n₂ + 2940n₃ + 980n₄ + 1512n₅ + 540n₆ mod 3780   (8)

k = 945k₁ + 2380k₂ + 3360k₃ + 2520k₄ + 2268k₅ + 540k₆ mod 3780   (9)

bank = n₁ + n₂ + n₃ + n₄ + n₅ + n₆ mod 7   (10)

address = 135n₁ + 45n₂ + 15n₃ + 5n₄ + n₅   (11)
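Eqs. (8) to (11) can be checked by enumerating all 3780 index vectors. The sketch below is a verification aid only, with hypothetical names: it tests that Eqs. (8) and (9) each visit every index once, that every datum gets its own (bank, address) location, and that the inputs of every butterfly of every stage lie in different banks.

```python
from itertools import product

SIZES = (4, 3, 3, 3, 5, 7)                     # type of FFT in each stage
N_COEF = (945, 1260, 2940, 980, 1512, 540)     # Eq. (8)
K_COEF = (945, 2380, 3360, 2520, 2268, 540)    # Eq. (9)
ADDR_COEF = (135, 45, 15, 5, 1, 0)             # Eq. (11); n6 is not used
BANKS = 7

def linear_index(coef, vec):
    return sum(c * v for c, v in zip(coef, vec)) % 3780

def bank(vec):
    return sum(vec) % BANKS                    # Eq. (10)

def address(vec):
    return sum(c * v for c, v in zip(ADDR_COEF, vec))

vectors = list(product(*(range(s) for s in SIZES)))

def stage_is_conflict_free(stage):
    """True if all inputs of every butterfly in `stage` (0-based) use distinct banks."""
    for vec in vectors:
        if vec[stage] != 0:
            continue                            # visit each butterfly once
        banks = {bank(vec[:stage] + (v,) + vec[stage + 1:])
                 for v in range(SIZES[stage])}
        if len(banks) != SIZES[stage]:
            return False
    return True

eq8_is_one_to_one = sorted(linear_index(N_COEF, v) for v in vectors) == list(range(3780))
eq9_is_one_to_one = sorted(linear_index(K_COEF, v) for v in vectors) == list(range(3780))
locations_are_unique = len({(bank(v), address(v)) for v in vectors}) == 3780
```

The largest address produced by Eq. (11) is 539, so each of the 7 banks holds 540 words.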

After the first FFT symbol (the odd FFT symbol) has been computed, all the indexes n_(i) in Eqs. (10) and (11) are replaced by k_(i). For the second input FFT symbol (the even FFT symbol), since the even FFT input data x_(even)[n] should be put in the location of the odd FFT output data X_(odd)[k] with k=n, its index mapping, memory bank and address are determined by Eqs. (9), (10) and (11) respectively.

Now, the present invention considers how to keep memory access conflict-free under this data distribution for the even FFT symbol.

For the computation of the even FFT symbol, the present invention takes the reverse decomposition order of the odd FFT symbol, 3780=7×5×3×3×3×4. That is, the present invention computes it in the order 7-point, 5-point, 3-point, 3-point, 3-point and 4-point FFT. By a similar approach, the present invention gets the full input and output index mapping Eqs. (12) and (13) for the even FFT symbols, just like Eqs. (8) and (9) for the odd FFT symbols. They map the input index n and the output index k to the vectors (a₁, a₂, a₃, a₄, a₅, a₆) and (b₁, b₂, b₃, b₄, b₅, b₆) respectively, from [0, 3779] to [0, 6]×[0, 4]×[0, 2]×[0, 2]×[0, 2]×[0, 3], for the even FFT symbols.

n = 540a₁ + 2268a₂ + 2520a₃ + 3360a₄ + 2380a₅ + 945a₆ mod 3780   (12)

k = 540b₁ + 1512b₂ + 980b₃ + 2940b₄ + 1260b₅ + 945b₆ mod 3780   (13)

Comparing the input and output index mapping equation pairs (9) with (12) and (8) with (13), the present invention finds that these two equation pairs match perfectly under the vector-reverse relation shown in Eqs. (14) and (15).

(a₁, a₂, a₃, a₄, a₅, a₆) = (k₆, k₅, k₄, k₃, k₂, k₁)   (14)

(n₁, n₂, n₃, n₄, n₅, n₆) = (b₆, b₅, b₄, b₃, b₂, b₁)   (15)

Since the input data of the even FFT symbol x_(even)[n] should be put in the location of the computed odd FFT symbol output data X_(odd)[k] with k=n, the present invention gets Eqs. (16) and (17) for bank selection and memory addressing for the even FFT symbol.

bank = a₁ + a₂ + a₃ + a₄ + a₅ + a₆ mod 7   (16)

address = 135a₆ + 45a₅ + 15a₄ + 5a₃ + a₂   (17)

Note that the input mapping equation (12) and the output mapping equation (9) are the same, and that the bank selection equations (10) and (16) are also the same. Thus the memory access stays conflict-free for the even FFT symbol when it is computed in the reverse decomposition order of the odd FFT symbol.
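The vector-reverse relation can be confirmed by enumeration. The following sketch, with hypothetical helper names, checks that for every output vector (k₁, . . . , k₆) of the odd symbol the reversed vector of Eq. (14) yields the same sample index through Eq. (12) and the same memory location through Eqs. (16) and (17).

```python
from itertools import product

TYPE_I = (4, 3, 3, 3, 5, 7)
K_ODD = (945, 2380, 3360, 2520, 2268, 540)     # Eq. (9), output map of the odd symbol
N_EVEN = (540, 2268, 2520, 3360, 2380, 945)    # Eq. (12), input map of the even symbol

def index(coef, vec):
    return sum(c * v for c, v in zip(coef, vec)) % 3780

def odd_slot(k):
    """(bank, address) of X_odd: Eqs. (10) and (11) with n_i replaced by k_i."""
    return sum(k) % 7, 135 * k[0] + 45 * k[1] + 15 * k[2] + 5 * k[3] + k[4]

def even_slot(a):
    """(bank, address) of x_even: Eqs. (16) and (17)."""
    return sum(a) % 7, 135 * a[5] + 45 * a[4] + 15 * a[3] + 5 * a[2] + a[1]

def reversal_holds():
    for k in product(*(range(s) for s in TYPE_I)):
        a = k[::-1]                                 # Eq. (14)
        if index(N_EVEN, a) != index(K_ODD, k):     # same sample index, n = k
            return False
        if even_slot(a) != odd_slot(k):             # same memory location
            return False
    return True
```

In this sketch `reversal_holds()` returns `True`, which is consistent with the statement that the even symbol can reuse the data distribution left by the odd symbol.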

Furthermore, the present invention notes that the output mapping equation (13) and the input mapping equation (8) are also the same. This means that, when the third FFT input data x_(third)[n] is put in the location of the even FFT output data X_(even)[k] with k=n, the distribution of the third FFT input data is the same as that determined by Eqs. (8), (10) and (11). It returns to the state of the odd FFT symbol and we have x_(third)[n]=x_(odd)[n].

From the above discussion, a conflict-free variable-radix FFT processor can be designed by reversing the decomposition order of the previous FFT symbol. It supports the in-place policy for both butterfly output and I/O data simultaneously.

FIG. 1 shows the block diagram of the memory-based 3780-point DFT processor design. MEM_1 and MEM_2 are two memory blocks; each block contains 7 memory banks and each bank has 540 words. The FFT_CORE contains the processing element, which can process each of the smaller sized DFTs individually. The control unit controls the data access and the processing element computation. In the following, the present invention describes how the DFT processor works. For convenience, the present invention denotes the decomposition and computation order 4-point, 3-point, 3-point, 3-point, 5-point and then 7-point DFT as the type I order, and denotes the reverse order of type I, 7-point, 5-point, 3-point, 3-point, 3-point and then 4-point DFT, as the type II order.

Suppose the 1^(st) and the 2^(nd) DFT symbols are stored in the memory blocks MEM_1 and MEM_2 respectively, and suppose the decomposition and computation order of the 1^(st) and the 2^(nd) DFT symbols are both the type I order. Then, from the previous discussion, the present invention has:

(1) the 1^(st), 5^(th), 9^(th), 13^(th), . . . DFT symbols are stored in MEM_1 and computed in type I order.

(2) the 2^(nd), 6^(th), 10^(th), 14^(th), . . . DFT symbols are stored in MEM_2 and computed in type I order.

(3) the 3^(rd), 7^(th), 11^(th), 15^(th), . . . DFT symbols are stored in MEM_1 and computed in type II order.

(4) the 4^(th), 8^(th), 12^(th), 16^(th), . . . DFT symbols are stored in MEM_2 and computed in type II order.
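The four cases listed above repeat with a period of four symbols and can be summarized as a small lookup. The function below is a hypothetical illustration of that schedule, not a description of the control unit hardware.

```python
def schedule(symbol):
    """Return (memory block, computation order) for a 1-based DFT symbol number."""
    # Odd-numbered symbols use MEM_1, even-numbered symbols use MEM_2.
    block = "MEM_1" if symbol % 2 == 1 else "MEM_2"
    # Within each block the order alternates between type I and type II,
    # so the order changes after every pair of consecutive symbols.
    order = "type I" if ((symbol - 1) // 2) % 2 == 0 else "type II"
    return block, order
```

For instance, `schedule(9)` gives MEM_1 with type I order and `schedule(12)` gives MEM_2 with type II order.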

For convenience, the present invention shows the hardware implementation that generates the index vector for data access only for the DFT size N=N₁N₂N₃, in FIG. 2. As shown in FIG. 2, it consists of several accumulators A₁, A₂, . . . , A₅. The parameters U₁, U₂, U₃, U₄, q and r are determined by the decomposition equation (1) for each stage.

Table II compares this invention with different approaches to memory-based DFT processor design for real-time applications. [B4] needs a memory size of 4N words, two times that of the others, because it does not adopt the in-place policy for butterfly computation and I/O data. Hence, in order to minimize the necessary memory size, the in-place policy should be adopted for both butterfly computation and I/O data. Excluding [B4], among all of the others only this invention and [A1] can support a mix of any general radices and therefore FFTs of any size. However, [A1] cannot support the multi-bank memory structure without memory conflict to reduce the necessary clock rate for real-time applications.

In this invention, the inventors propose an approach to memory-based FFT processor design that supports the following three items simultaneously: (1) the in-place policy for both butterfly computation and I/O data; (2) a mix of any radices; (3) the multi-bank memory structure.

TABLE II
THE COMPARISON BETWEEN DIFFERENT APPROACHES

              [A1]           [A2] [B2] [A5]    [A3]             [A4] [B3]    This invention
Radix         All general    Fixed radix-r     Fixed radix-r    Radix-2/4    All general
In-place      Yes            Yes               Yes              Yes          Yes
Memory        2N words       2N words          2N words         2N words     2N words
Multi-bank    No             Yes               No               Yes          Yes

When the proposed approach is applied to the 3780-point DFT in the Chinese DTV application, this invention uses only 2N words of memory to achieve the system requirements of continuous data flow and in-order I/O data. In order to achieve this, however, the solutions [A6]-[A8] need at least 3N words, [A9] and [A11] need at least 5N words, [A10] needs 6N words, and [B1] needs more than 3N words. Hence, for this application, if the 3780-point DFT processor is designed with this invention, the chip area can be reduced significantly compared with a processor designed with any current solution.

Although a preferred embodiment of the invention has been described for purposes of illustration, it is understood that various changes and modifications to the described embodiment can be carried out without departing from the scope of the invention as disclosed in the appended claims.

1. A design method for a general sized memory-based FFT processor, comprising the steps of: (1) decomposing a large-size FFT computation into several relatively smaller FFTs and transforming the original index from one dimension into a multi-dimensional index vector; (2) controlling the index vector to distribute the input data into different memory banks such that both the in-place policy for computation and the multi-bank memory for a high-radix structure are supported simultaneously without memory conflict; (3) reversing the decomposition order of the FFT to satisfy the vector-reverse behavior, in order to keep memory access conflict-free when the in-place policy is also adopted for I/O data; the steps thereby minimizing the area and reducing the necessary clock rate.
2. A memory-based FFT/IFFT processor comprising: a main memory to store data; a processing element for decomposed smaller sized FFT computation; and a control unit to control: (1) the memory for I/O data and butterfly computation; (2) the computation order of the smaller sized FFTs; (3) the memory addressing for data access with the in-place policy.
3. The memory-based FFT/IFFT processor as claimed in claim 2, wherein the main memory has two memory blocks, MEM block1 and MEM block2; when MEM block1 is used for FFT computation, MEM block2 is used for I/O data, and vice versa.
4. The memory-based FFT/IFFT processor as claimed in claim 3, wherein each of the memory blocks consists of M memory banks and the size of each bank is N/M, wherein N is the FFT size and M is the number of banks, which is determined by the designer.
5. The memory-based FFT/IFFT processor as claimed in claim 2, wherein the processing element is designed such that it can compute each of the decomposed smaller sized FFTs individually.
6. The memory-based FFT/IFFT processor as claimed in claim 2, wherein the function (1) of the control unit controls the memory blocks, which are described in claim 3, to exchange their functions of FFT computation and I/O data.
7. The memory-based FFT/IFFT processor as claimed in claim 2, wherein the function (2) of the control unit controls the processing element to compute the FFT symbol with smaller sized FFTs in the reverse decomposition order of the previous FFT symbol for the same memory block; that is, for each memory block, if the FFT symbol in one memory block is computed in the order N₁-point FFT, N₂-point FFT, . . . , and then N_(k)-point FFT, the computation order of the next FFT symbol which is stored in this memory block is N_(k)-point FFT, N_((k−1))-point FFT, . . . , and then N₁-point FFT.
8. The memory-based FFT/IFFT processor as claimed in claim 2, wherein the function (3) of the control unit controls the data access with the in-place policy for both the butterfly computation and the I/O data for each memory block.
9. The memory-based FFT/IFFT processor as claimed in claim 8, wherein the memory bank in which the accessed data appears is determined by the index vector (n₁, n₂, . . . , n_(k)) and the equation (a₁), where c in Eq. (a₁) is a constant integer and the index vector corresponds to the smaller sized FFTs, N₁-point FFT, N₂-point FFT, . . . , and N_(k)-point FFT, which are described in claim 7;
bank = n₁ + n₂ + . . . + n_(k) + c mod M.   (a₁)
10. The memory-based FFT/IFFT processor as claimed in claim 8, wherein the address of the accessed data is determined by the index vector (n₁, n₂, . . . , n_(k)) and equation (a₂):
$\begin{matrix}{{address} = {{\sum\limits_{i = 1}^{k - 2}{\left( {\prod\limits_{j = {i + 1}}^{k - 1}U_{j}} \right)u_{i}}} + {u_{k - 1}\ {mod}\ \left( {N/M} \right)}}} & \left( a_{2} \right)\end{matrix}$
Here, N=N₁N₂ . . . N_(k), and M=N_(t) is one of {N₁, N₂, . . . , N_(k)}; (U₁, U₂, . . . , U_(k−1))=(N₁, N₂, . . . , N_(t−1), N_(t+1), . . . , N_(k)) and (u₁, u₂, . . . , u_(k−1))=(n₁, n₂, . . . , n_(t−1), n_(t+1), . . . , n_(k)).
11. The memory-based FFT/IFFT processor as claimed in claim 9, wherein the index vector is generated by the decomposition equations (a₃) and (a₄) in each stage with N=N₁N₂, the input and output indexes being transformed by the input and output equations (a₃) and (a₄) respectively;
n = N₂n₁ + A₂n₂ mod N,   n₁, k₁ = 0, 1, . . . , N₁−1   (a₃)
k = B₁k₁ + N₁k₂ mod N,   n₂, k₂ = 0, 1, . . . , N₂−1   (a₄).
12. The memory-based FFT/IFFT processor as claimed in claim 10, wherein the index vector is generated by the decomposition equations (a₃) and (a₄) in each stage with N=N₁N₂, the input and output indexes being transformed by the input and output equations (a₃) and (a₄) respectively;
n = N₂n₁ + A₂n₂ mod N,   n₁, k₁ = 0, 1, . . . , N₁−1   (a₃)
k = B₁k₁ + N₁k₂ mod N,   n₂, k₂ = 0, 1, . . . , N₂−1   (a₄).